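There is no comprehensive metric for describing the complexity of Multi-Object Tracking (MOT) sequences. This lack of metrics decreases explainability, complicates the comparison of datasets, and reduces the conversation on tracker performance to a matter of leaderboard position. As a remedy, we present the novel MOT dataset complexity metric (MOTCOM), which is a combination of three sub-metrics inspired by key problems in MOT: occlusion, erratic motion, and visual similarity. The insights of MOTCOM can open nuanced discussions on tracker performance and may lead to wider recognition of novel contributions developed for lesser-known datasets or aimed at solving sub-problems. We evaluate MOTCOM on the comprehensive MOT17, MOT20, and MOTSynth datasets and show that MOTCOM describes the complexity of MOT sequences far better than the conventional density and number of tracks. Project page: https://vap.aau.dk/motcom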
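Inspired by the "cognitive hourglass" model presented in https://doi.org/10.1515/jagi-2016-0001, we propose a new framework for developing cognitive architectures aimed at cognitive robotics. The purpose of the proposed framework is to ease the development of cognitive architectures by encouraging and facilitating cooperation and the reuse of existing results. This is done by dividing the development of cognitive architectures into a series of layers that can be considered partly in isolation, some of which relate directly to other research fields. Finally, we introduce and review some topics relevant to the proposed framework.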
Accurate modeling of ship performance is crucial for the shipping industry to optimize fuel consumption and subsequently reduce emissions. However, predicting the speed-power relation in real-world conditions remains a challenge. In this study, we used in-service monitoring data from multiple vessels with different hull shapes to compare the accuracy of data-driven machine learning (ML) algorithms to traditional methods for assessing ship performance. Our analysis consists of two main parts: (1) a comparison of sea trial curves with calm-water curves fitted on operational data, and (2) a benchmark of multiple added wave resistance theories with an ML-based approach. Our results showed that a simple neural network outperformed established semi-empirical formulas following first principles. The neural network only required operational data as input, while the traditional methods required extensive ship particulars that are often unavailable. These findings suggest that data-driven algorithms may be more effective for predicting ship performance in practical applications.
Reinforcement Learning (RL) can enable agents to learn complex tasks. However, it is difficult to interpret the learned knowledge and reuse it across tasks. Inductive biases can address such issues by explicitly providing a generic yet useful decomposition that is otherwise difficult or expensive to learn implicitly. For example, object-centered approaches decompose a high-dimensional observation into individual objects. Expanding on this, we utilize an inductive bias for explicit object-centered knowledge separation that provides further decomposition into semantic representations and dynamics knowledge. For this, we introduce a semantic module that predicts an object's semantic state based on its context. The resulting affordance-like object state can then be used to enrich perceptual object representations. With a minimal setup and an environment that enables puzzle-like tasks, we demonstrate the feasibility and benefits of this approach. Specifically, we compare three different methods of integrating semantic representations into a model-based RL architecture. Our experiments show that the degree of explicitness in knowledge separation correlates with faster learning, better accuracy, better generalization, and better interpretability.
Due to the unequivocal need for understanding the decision processes of deep learning networks, both model-dependent and model-agnostic techniques have become very popular. Although both of these ideas provide transparency for automated decision making, most methodologies focus on either using the model gradients (model-dependent) or ignoring the model's internal states and reasoning about the model's behavior/outcome on instances (model-agnostic). In this work, we propose a unified explanation approach that, given an instance, combines both model-dependent and model-agnostic explanations to produce an explanation set. The generated explanations are not only consistent in the neighborhood of a sample but can also highlight causal relationships between image content and the outcome. We use the Wireless Capsule Endoscopy (WCE) domain to illustrate the effectiveness of our explanations. The saliency maps generated by our approach are comparable or better on the softmax information score.
Dexterous manipulation with anthropomorphic robot hands remains a challenging problem in robotics because of the high-dimensional state and action spaces and complex contacts. Nevertheless, skillful closed-loop manipulation is required to enable humanoid robots to operate in unstructured real-world environments. Reinforcement learning (RL) has traditionally imposed enormous interaction data requirements for optimizing such complex control problems. We introduce a new framework that leverages recent advances in GPU-based simulation along with the strength of imitation learning in guiding policy search towards promising behaviors to make RL training feasible in these domains. To this end, we present an immersive virtual reality teleoperation interface designed for interactive human-like manipulation on contact-rich tasks and a suite of manipulation environments inspired by tasks of daily living. Finally, we demonstrate the complementary strengths of massively parallel RL and imitation learning, yielding robust and natural behaviors. Videos of trained policies, our source code, and the collected demonstration datasets are available at https://maltemosbach.github.io/interactive_human_like_manipulation/.
We present Depth-aware Image-based NEural Radiance fields (DINER). Given a sparse set of RGB input views, we predict depth and feature maps to guide the reconstruction of a volumetric scene representation that allows us to render 3D objects under novel views. Specifically, we propose novel techniques to incorporate depth information into feature fusion and efficient scene sampling. In comparison to the previous state of the art, DINER achieves higher synthesis quality and can process input views with greater disparity. This allows us to capture scenes more completely without changing capturing hardware requirements and ultimately enables larger viewpoint changes during novel view synthesis. We evaluate our method by synthesizing novel views, both for human heads and for general objects, and observe significantly improved qualitative results and increased perceptual metrics compared to the previous state of the art. The code will be made publicly available for research purposes.
Purpose: This study aims to explore training strategies to improve convolutional neural network-based image-to-image registration for abdominal imaging. Methods: Different training strategies, loss functions, and transfer learning schemes were considered. Furthermore, an augmentation layer which generates artificial training image pairs on-the-fly was proposed, in addition to a loss layer that enables dynamic loss weighting. Results: Guiding registration using segmentations in the training step proved beneficial for deep-learning-based image registration. Finetuning the pretrained model from the brain MRI dataset to the abdominal CT dataset further improved performance on the latter application, removing the need for a large dataset to yield satisfactory performance. Dynamic loss weighting also marginally improved performance, all without impacting inference runtime. Conclusion: Using simple concepts, we improved the performance of a commonly used deep image registration architecture, VoxelMorph. In future work, our framework, DDMR, should be validated on different datasets to further assess its value.
Single-cell reference atlases are large-scale, cell-level maps that capture cellular heterogeneity within an organ using single-cell genomics. Given their size and cellular diversity, these atlases serve as high-quality training data for the transfer of cell type labels to new datasets. Such label transfer, however, must be robust to domain shifts in gene expression due to measurement technique, lab specifics, and more general batch effects. This requires methods that provide uncertainty estimates on the cell type predictions to ensure correct interpretation. Here, for the first time, we introduce uncertainty quantification methods for cell type classification on single-cell reference atlases. We benchmark four model classes and show that currently used models lack calibration, robustness, and actionable uncertainty scores. Furthermore, we demonstrate how models that quantify uncertainty are better suited to detect unseen cell types in the setting of atlas-level cell type transfer.
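We study the performance of CLAIRE, a diffeomorphic multi-node, multi-GPU image-registration algorithm and software, in large-scale biomedical imaging applications with billions of voxels. At such resolutions, most existing software packages for diffeomorphic image registration are prohibitively expensive. As a result, practitioners first significantly downsample the original images and then register them using existing tools. Our main contribution is an extensive analysis of the impact of downsampling on registration performance. We study this impact by comparing full-resolution registrations obtained with CLAIRE to lower-resolution registrations for synthetic and real-world imaging datasets. Our results suggest that registration at full resolution can yield superior registration quality, but not always. For example, downsampling a synthetic image from $1024^3$ to $256^3$ decreases the Dice coefficient from 92% to 79%. However, the differences are less pronounced for noisy or low-contrast high-resolution images. CLAIRE allows us not only to register images of clinically relevant size in a few seconds but also to register images at unprecedented resolution in reasonable time. The highest resolution considered is CLARITY images of size $2816\times3016\times1162$. To the best of our knowledge, this is the first study of image registration quality at such resolutions.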
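For readers unfamiliar with the Dice coefficient used above as the registration quality measure, the following is a minimal sketch of how it is commonly computed for two binary label masks, $2|A \cap B| / (|A| + |B|)$. The function name `dice_coefficient`, the flat 0/1 list representation, and the toy masks are illustrative assumptions; this is not CLAIRE's code or its evaluation pipeline.

```python
def dice_coefficient(seg_a, seg_b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) for two binary masks.

    Masks are flat sequences of 0/1 voxel labels (a simplifying
    assumption; real volumes would be 3D arrays).
    """
    if len(seg_a) != len(seg_b):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled foreground in both masks.
    intersection = sum(1 for a, b in zip(seg_a, seg_b) if a and b)
    size_a = sum(1 for a in seg_a if a)
    size_b = sum(1 for b in seg_b if b)
    if size_a + size_b == 0:
        # Convention chosen here (assumption): two empty masks agree fully.
        return 1.0
    return 2.0 * intersection / (size_a + size_b)


# Toy example: a reference mask and a hypothetical warped mask.
reference = [0, 1, 1, 1, 0, 0, 1, 0]
warped = [0, 1, 1, 0, 0, 1, 1, 0]
print(dice_coefficient(reference, warped))  # prints 0.75
```

A value of 1 means the two masks overlap perfectly and 0 means they are disjoint, so the reported drop from 92% to 79% corresponds to a noticeably poorer overlap between the registered and reference labels.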